Introduction to ggplot2

Gabriela Palomo & Dan Herrera

Instructors

Learning objectives

After today’s lecture, you’ll be able to:

  • Understand the basic syntax of ggplot2.
  • Create basic plots: bar, points, lines, boxplots, error bars, etc.
  • Create color palettes and use colors effectively: qualitative, sequential, and diverging palettes.
  • Customize the theme of a plot.

Grammar of Graphics

  • ggplot2 is an R package for creating graphics.

  • It was created by Hadley Wickham and is part of the tidyverse.

  • Compose graphs by combining independent components: versatile!

  • If you learn the grammar then you will end up creating better graphics in less time.

Data structure

  • Wide format
species  2007  2008  2009
Adelie   3750  NA    NA
Adelie   3800  NA    NA
Adelie   3250  NA    NA
Adelie   NA    NA    NA
Adelie   3450  NA    NA
Adelie   3650  NA    NA
Adelie   3625  NA    NA
Adelie   4675  NA    NA
Adelie   3475  NA    NA
Adelie   4250  NA    NA
  • Long format
species  year  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex
Adelie   2007  Torgersen  39.1            18.7           181                3750         male
Adelie   2007  Torgersen  39.5            17.4           186                3800         female
Adelie   2007  Torgersen  40.3            18.0           195                3250         female
Adelie   2007  Torgersen  NA              NA             NA                 NA           NA
Adelie   2007  Torgersen  36.7            19.3           193                3450         female
Adelie   2007  Torgersen  39.3            20.6           190                3650         male
Adelie   2007  Torgersen  38.9            17.8           181                3625         female
Adelie   2007  Torgersen  39.2            19.6           195                4675         male
Adelie   2007  Torgersen  34.1            18.1           193                3475         NA
Adelie   2007  Torgersen  42.0            20.2           190                4250         NA
  • ggplot2 works with long format data.

  • Each row is an observation point and each column is a variable.

  • Wrangle your data BEFORE you graph: tidyr::pivot_longer() converts wide data to long format.
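As a minimal sketch of that wrangling step, the block below rebuilds the first three rows of the wide table and pivots them to long format. The object names `wide` and `long` and the new column names `year` and `body_mass_g` are choices made here for illustration; they are not fixed by tidyr.

```r
library(tidyr)

# First three rows of the wide table: one body mass column per year.
wide <- data.frame(
  species = c("Adelie", "Adelie", "Adelie"),
  `2007` = c(3750, 3800, 3250),
  `2008` = rep(NA_real_, 3),
  `2009` = rep(NA_real_, 3),
  check.names = FALSE  # keep the year column names as "2007", "2008", "2009"
)

# Gather the three year columns into a year/value pair of columns.
long <- pivot_longer(
  wide,
  cols = c(`2007`, `2008`, `2009`),
  names_to = "year",           # former column names go here
  values_to = "body_mass_g",   # former cell values go here
  values_drop_na = TRUE        # drop year/individual combinations with no measurement
)

# Column names are character after pivoting, so convert year to integer.
long$year <- as.integer(long$year)
long
```

After pivoting, each row is one observation and each column is one variable, which is the shape ggplot() expects.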

ggplot2

Mapping components

  • 6 main building blocks, each with its own arguments.
ggplot(data = data, mapping = aes(x = x, y = y)) + # data and aesthetic mappings
  geom_*( ) + # geometries: e.g., geom_point(), geom_bar(), ...
  coord_*( ) + # coordinate system: e.g., coord_cartesian(), coord_flip(), coord_polar()
  facet_*( ) + # dividing your data into facets: facet_grid() and facet_wrap()
  scale_*( ) + # controls visual values: colors, fills, shapes. E.g., scale_color_manual().
  theme_*( )   # controls the overall appearance of the plot: fonts, font size, etc.

Always begin with ggplot()

ggplot(data = penguins,
       mapping = aes(x = body_mass_g, 
                     y = flipper_length_mm)) 

  • ggplot(): graphing space.

  • data: data frame or tibble in long format.

    • reference object for all subsequent arguments and functions.
  • aes(): maps columns to aesthetics such as the x and y axes; it uses column names.
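A short sketch of what this means in practice: ggplot() on its own only sets up the graphing space, and nothing is drawn until a geometry is added. It assumes that `penguins` comes from the palmerpenguins package (the column names on these slides match that data set); `p_empty` and `p_points` are names chosen here.

```r
library(ggplot2)

# Assumption: `penguins` is the data set shipped with the palmerpenguins package.
penguins <- palmerpenguins::penguins

# ggplot() alone: axes and background, but no data drawn yet.
p_empty <- ggplot(data = penguins,
                  mapping = aes(x = body_mass_g,
                                y = flipper_length_mm))

# Adding a geometry layer draws one point per row.
p_points <- p_empty + geom_point()
p_points
```

Rows with missing values are dropped from the drawing with a warning.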

Geometries or geom_*

ggplot(data = data) +
  geom_*(aes(x = x, 
             y = y, 
             color = z, 
             fill = f, 
             shape = w, 
             linetype = q), 
         color = color, # points, lines, error bars
         shape = shape, # point shape: a pch number or a vector of pch numbers
         linetype = linetype, # number or name: dotted line, dashed line...
         fill = fill, # bars, columns, boxplots, violins
         alpha = 0.3, # transparency 
         position = position_dodge() # bar plots are not stacked
  ) 
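To make the template concrete, here is one possible bar plot that uses a mapped fill together with static color, alpha, and position arguments. It is a sketch that assumes `penguins` is the palmerpenguins data set; `p_bar` is a name chosen here.

```r
library(ggplot2)

# Assumption: `penguins` is the data set shipped with the palmerpenguins package.
penguins <- palmerpenguins::penguins

p_bar <- ggplot(data = penguins) +
  geom_bar(aes(x = species,
               fill = sex),             # mapped: one fill color per sex
           color = "black",             # static: outline of every bar
           alpha = 0.8,                 # static: transparency
           position = position_dodge()) # bars side by side instead of stacked
p_bar
```

Without position = position_dodge(), geom_bar() stacks the bars for each species on top of each other.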

More on geometries

  • aes() inside ggplot() applies to all the geometries used in a plot.
  • aes() inside a geom applies only to that geom. This means that sometimes you need to specify different or additional aes() when combining different geoms in one plot.
  • Static arguments go outside aes(): color, fill, shape, alpha (transparency, 0-1), position, size, or linewidth.
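The three rules above can be sketched in a single plot. The example assumes `penguins` is the palmerpenguins data set; `p_layers` is a name chosen here, and the linear smoother is only one way to add a second geom.

```r
library(ggplot2)

# Assumption: `penguins` is the data set shipped with the palmerpenguins package.
penguins <- palmerpenguins::penguins

p_layers <- ggplot(penguins,
                   aes(x = body_mass_g,
                       y = flipper_length_mm)) +  # shared by both geoms
  geom_point(aes(color = species),                # mapped, this geom only
             alpha = 0.3) +                       # static, outside aes()
  geom_smooth(method = "lm", formula = y ~ x,
              color = "black")                    # static: one black line
p_layers
```

Because color = species is mapped only inside geom_point(), the smoother is not split by species and fits a single line through all the data.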

Shapes and lines

  • geom_point(shape = shape): the shape can be any of the pch numbers.
  • geom_line(linetype = linetype): the line type can be either an integer (0-6) or a name (0 = blank, 1 = solid, 2 = dashed, 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodash).
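For instance, a sketch that sets one static shape and one static line type, assuming `penguins` is the palmerpenguins data set (`p_shape` is a name chosen here; geom_smooth() stands in for any line-drawing geom):

```r
library(ggplot2)

# Assumption: `penguins` is the data set shipped with the palmerpenguins package.
penguins <- palmerpenguins::penguins

p_shape <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(shape = 17) +             # pch 17: filled triangle
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              linetype = "dashed")     # same as linetype = 2
p_shape
```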

Choosing the right visualization

Different types of geometries

Change visual values using scale_*()

  • Lets you change the visual values of a group aesthetic: colors, fills, shapes (e.g., scale_color_manual()).
  • Discrete and continuous scales.
  • Predetermined color palettes: ggthemes::scale_color_colorblind()
  • Use xlab('x-axis title') or ylab('y-axis title') or ggtitle('title')
  • labs(title, subtitle, caption, alt)
  • Change x- or y- limits by using xlim(0, 1) or ylim(0, 1)
  • Find more info here
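A sketch that combines a manual color scale with labels and axis limits. It assumes `penguins` is the palmerpenguins data set; the hex colors, label text, and limits are arbitrary choices for illustration, and `p_lab` is a name chosen here.

```r
library(ggplot2)

# Assumption: `penguins` is the data set shipped with the palmerpenguins package.
penguins <- palmerpenguins::penguins

p_lab <- ggplot(penguins,
                aes(x = body_mass_g,
                    y = flipper_length_mm,
                    color = island)) +
  geom_point() +
  # One hand-picked color per island (illustrative values).
  scale_color_manual(values = c(Biscoe = "#E69F00",
                                Dream = "#56B4E9",
                                Torgersen = "#009E73")) +
  labs(title = "Penguin size",
       x = "Body mass (g)",
       y = "Flipper length (mm)",
       color = "Island") +
  xlim(2500, 6500)   # data outside these limits would be dropped
p_lab
```

Note that xlim() and ylim() remove data outside the limits before drawing; coord_cartesian(xlim = ...) zooms without removing data.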

scale_*()

ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 color = island), size = 3) +
  ggthemes::scale_color_colorblind()

ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 shape = island), size = 3.5) +
  scale_shape_discrete() # up to 6 discernible shapes

Change color palettes scale_*()

scale_*() functions can modify:

  • Position via scale_x_*() or scale_y_*()

  • Colors via scale_color_*() and scale_fill_*()

  • Transparency via scale_alpha_*()

  • Sizes via scale_size_*()

  • Shapes via scale_shape_*()

    • * can take the following forms:
      • Axes: continuous, discrete, reverse, log10, sqrt, date, time.
      • Colors & fill: continuous, discrete, manual, gradient, hue, brewer.
      • Transparency: continuous, discrete, manual, ordinal, identity, date.
      • Sizes: continuous, discrete, manual, ordinal, identity, area, date.
      • Shapes and line types: continuous, discrete, manual, ordinal, identity.
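As an illustration of how the pattern scale_<aesthetic>_<form>() reads, the sketch below uses one position scale, one color scale, and one shape scale in the same plot. It assumes `penguins` is the palmerpenguins data set; the colors and pch numbers are arbitrary, and `p_scales` is a name chosen here.

```r
library(ggplot2)

# Assumption: `penguins` is the data set shipped with the palmerpenguins package.
penguins <- palmerpenguins::penguins

p_scales <- ggplot(penguins,
                   aes(x = body_mass_g,
                       y = flipper_length_mm,
                       color = bill_length_mm,   # continuous variable
                       shape = species)) +       # discrete variable
  geom_point(size = 2) +
  scale_x_log10() +                              # position: log10 x axis
  scale_color_gradient(low = "lightblue",
                       high = "darkblue") +      # color: continuous gradient
  scale_shape_manual(values = c(Adelie = 16,
                                Chinstrap = 17,
                                Gentoo = 15))    # shape: manual pch numbers
p_scales
```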

Color palettes

Color palette types:

  • Generally, there are 3 types of palettes:

    • Sequential: data that goes from low to high.
    • Diverging: put equal emphasis on mid-range values and extremes.
    • Qualitative: best for categorical data. Visual differences are given by hues.
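One way to try the three palette types with scales that ship with ggplot2 is sketched below. It assumes `penguins` is the palmerpenguins data set; the palette names ("Set2", "Blues"), the hex colors, and the choice of the mean as midpoint are illustrative, and `p_base`, `p_qual`, `p_seq`, and `p_div` are names chosen here.

```r
library(ggplot2)

# Assumption: `penguins` is the data set shipped with the palmerpenguins package.
penguins <- palmerpenguins::penguins

p_base <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm))

# Qualitative: distinct hues for a categorical variable.
p_qual <- p_base +
  geom_point(aes(color = species)) +
  scale_color_brewer(palette = "Set2")

# Sequential: light to dark for a variable that goes from low to high.
p_seq <- p_base +
  geom_point(aes(color = body_mass_g)) +
  scale_color_distiller(palette = "Blues", direction = 1)

# Diverging: two hues that meet at a midpoint (here, the mean body mass).
p_div <- p_base +
  geom_point(aes(color = body_mass_g)) +
  scale_color_gradient2(low = "#2166AC", mid = "white", high = "#B2182B",
                        midpoint = mean(penguins$body_mass_g, na.rm = TRUE))
```

Printing p_qual, p_seq, or p_div draws the corresponding plot.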

Divide a plot using facet_wrap() and facet_grid()

Divide a plot using facet_*()

  • We have two options: facet_wrap() and facet_grid().
  • Facets divide a plot into subplots based on a variable in the dataset.
  • Allows for comparison across groups.
ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 color = island)) +
  facet_wrap(~island) # one panel per island

ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 color = island)) +
  facet_grid(species ~ island) # rows: species, columns: island

The look of the graph theme_*()

ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 color = island)) +
  theme_classic()

  • Modifies the overall visual defaults of a plot: titles, background color, gridlines, legends, etc.

  • theme() and theme_*().

    • theme() helps you customize and personalize the overall look of your plot.
    • You can start with a predefined theme_*() and then customize it with theme().
  • theme() uses element_*() functions, such as element_text(), element_line(), and element_rect(), to modify different areas.
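A minimal sketch of that workflow: start from a predefined theme, then adjust individual elements with theme(). It assumes `penguins` is the palmerpenguins data set; the specific settings are arbitrary and `p_theme` is a name chosen here.

```r
library(ggplot2)

# Assumption: `penguins` is the data set shipped with the palmerpenguins package.
penguins <- palmerpenguins::penguins

p_theme <- ggplot(penguins,
                  aes(x = body_mass_g,
                      y = flipper_length_mm,
                      color = island)) +
  geom_point() +
  theme_bw() +                                    # predefined starting point
  theme(legend.position = "bottom",               # then customize single elements
        axis.title = element_text(size = 14),
        panel.grid.minor = element_blank())       # element_blank() removes an element
p_theme
```

Order matters here: a theme_*() call added after theme() would overwrite the customizations.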

Predefined ggplot2 themes

  • Predefined ggplot2 themes: theme_classic(), theme_gray(), theme_bw(), theme_linedraw(), theme_light(), theme_dark(), theme_minimal(), theme_void()

Before we learn about modifying the theme()

Modify elements in the theme()

ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 color = island)) +
  theme(plot.background = element_rect(color = 'green', fill = 'gray80'), 
        # linewidth replaces the deprecated size argument for lines and borders (ggplot2 >= 3.4.0)
        panel.background = element_rect(color = 'orange', linewidth = 3, fill = 'pink'),
        panel.grid.major = element_line(color = 'blue', linewidth = 2), 
        legend.position = 'bottom', 
        axis.title = element_text(size = 20))

Useful cheatsheets for theme

Let’s remember the mapping components

ggplot(data = data, mapping = aes(x = x, y = y)) + # data and aesthetic mappings
  geom_*( ) + # geometries: e.g., geom_point(), geom_bar(), ...
  coord_*( ) + # coordinate system: e.g., coord_cartesian(), coord_flip(), coord_polar()
  facet_*( ) + # dividing your data into facets: facet_grid() and facet_wrap()
  scale_*( ) + # controls visual values: colors, fills, shapes. E.g., scale_color_manual().
  theme_*( )   # controls the overall appearance of the plot: fonts, font size, etc.

End